runZinb <- TRUE
runClus <- TRUE
NCORES <- 7
FP: The version of zinbwave to use is the latest version of the normvalues branch (branched from master). Parallel computing is now handled by BiocParallel. If you have a Windows machine, please update the code to allow parallel computing.
mysystem <- Sys.info()[['sysname']]
switch(mysystem,
Windows = {print("I'm a Windows PC and you should choose the parallel computing you want.")},
Linux = {print("I'm a penguin and I'll use MulticoreParam for parallel computing.")},
Darwin = {print("I'm a Mac and I'll use package doParallel for parallel computing.")})
## [1] "I'm a Mac and I'll use package doParallel for parallel computing."
if (mysystem == 'Darwin'){
registerDoParallel(NCORES)
register(DoparParam())
}else if (mysystem == 'Linux'){
register(bpstart(MulticoreParam(workers=NCORES)))
}else{
print('Please change this to allow parallel computing')
register(SerialParam())
}
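For the Windows branch above, a minimal sketch of what could be registered (an assumption, not tested here): BiocParallel's SnowParam backend uses socket clusters, which work on Windows, mirroring how the Linux branch registers MulticoreParam.

```r
## Sketch for Windows (assumption: a socket-cluster backend is acceptable).
## SnowParam from BiocParallel works on Windows; NCORES is defined above.
library(BiocParallel)
register(bpstart(SnowParam(workers = NCORES)))
```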
EAP: small request. Can everyone put a blank line between the beginning of an R chunk and the text? It makes it nicely formatted for my text editor.
We propose a workflow to analyze single-cell RNA-Seq data with the following steps.
Along the workflow, we use deviance residuals as adjusted values.
knitr::include_graphics('../vignettes/workflow.png')
Workflow to analyze single-cell RNA-Seq data
Along the workflow, we want to use a single SummarizedExperiment object carrying all the data we need.
EAP: I have updated the code to pull from a dataset in the repo that is created with the createData.R file. For now, I am filtering to the top 1000 most variable genes there, though we might want to add that to the code for the article. This will be slightly different data from Russell’s, which didn’t use all of the samples. We can adjust that decision later, or just compare the samples that are the same. (Russell’s clusterLabels are in the metadata.)
EAP: zinbFit doesn’t accept data.frame objects, so currently we have to call data.matrix first. Should it be changed so that it does?
#counts<-read.table("../data/oeCufflinkCountData.txt",sep="\t",header=TRUE)
core <- read.table("../data/oeCufflinkCountData_1000Var.txt",
sep = "\t", header = TRUE)
core <- data.matrix(core)
metadata <- read.table("../data/oeMetadata.txt", sep = "\t", header = TRUE)
# symbol for samples missing from original clustering
metadata$clusterLabels[is.na(metadata$clusterLabels)] <- -2
Here we only look at the 1000 most variable genes. EAP: see note above, I’ve commented out the filtering and added it to the createData.R for now.
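For reference, the filtering in createData.R is presumably along these lines (a sketch under the assumption that the full count matrix is read from the commented-out file above; not the actual createData.R code):

```r
## Sketch of the variance filtering assumed to live in createData.R:
## keep the 1000 genes with the largest variance across cells.
counts <- read.table("../data/oeCufflinkCountData.txt", sep = "\t", header = TRUE)
vars <- apply(counts, 1, var)
counts1000 <- counts[order(vars, decreasing = TRUE)[1:1000], ]
```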
batch <- metadata$Batch
Cells were processed in 18 different batches.
col_batch <- rep(brewer.pal(9, "Set1"), 2)
names(col_batch) <- unique(batch)
table(batch)
## batch
## GBC08A GBC08B GBC09A GBC09B P01 P02 P03A P03B P04 P05
## 41 47 43 38 34 49 73 52 24 23
## P06 P10 P11 P12 P13 P14 Y01 Y04
## 53 51 54 51 60 49 65 42
We have QC measures for the data.
qc <- metadata[, !names(metadata) %in% c("Batch", "Experiment", "clusterLabels")]
head(qc, 2)
## NREADS NALIGNED RALIGN TOTAL_DUP PRIMER
## OEP01_N706_S501 3313260 3167600 95.6035 47.9943 0.0154566
## OEP01_N701_S501 2902430 2757790 95.0167 45.0150 0.0182066
## PCT_RIBOSOMAL_BASES PCT_CODING_BASES PCT_UTR_BASES
## OEP01_N706_S501 2e-06 0.200130 0.230654
## OEP01_N701_S501 0e+00 0.182461 0.201810
## PCT_INTRONIC_BASES PCT_INTERGENIC_BASES PCT_MRNA_BASES
## OEP01_N706_S501 0.404205 0.165009 0.430784
## OEP01_N701_S501 0.465702 0.150027 0.384271
## MEDIAN_CV_COVERAGE MEDIAN_5PRIME_BIAS MEDIAN_3PRIME_BIAS
## OEP01_N706_S501 0.843857 0.061028 0.521079
## OEP01_N701_S501 0.914370 0.033350 0.373993
## CreER ERCC_reads
## OEP01_N706_S501 1 10516
## OEP01_N701_S501 3022 9331
clus.labels <- metadata[, "clusterLabels"]
In the original work (FP: add ref), cells were clustered into 14 different clusters.
col_clus <- c("transparent", brewer.pal(12, "Set3"), brewer.pal(8, "Set2"))
col_clus <- col_clus[1:length(unique(clus.labels))]
names(col_clus) <- sort(unique(clus.labels))
table(clus.labels)
## clus.labels
## -2 1 2 3 4 5 7 8 9 10 11 12 14 15
## 233 91 25 56 40 96 60 28 79 26 22 35 26 32
Batches are partially confounded with the biology.
table(data.frame(batch = as.vector(batch),
cluster = clus.labels))
## cluster
## batch -2 1 2 3 4 5 7 8 9 10 11 12 14 15
## GBC08A 5 0 2 12 9 0 0 0 0 0 2 0 2 9
## GBC08B 14 0 7 5 3 0 0 0 1 2 4 0 5 6
## GBC09A 13 0 1 5 9 0 0 0 1 1 0 0 6 7
## GBC09B 21 0 2 2 7 0 0 0 3 0 0 0 3 0
## P01 9 0 2 4 3 15 1 0 0 0 0 0 0 0
## P02 6 2 0 9 3 15 3 3 2 3 0 2 1 0
## P03A 36 3 0 3 0 12 2 9 4 2 0 2 0 0
## P03B 19 1 2 1 1 11 1 2 10 1 1 2 0 0
## P04 10 0 0 0 0 11 1 0 1 1 0 0 0 0
## P05 3 0 0 0 1 11 3 0 1 0 2 2 0 0
## P06 14 1 2 3 0 8 2 4 8 4 1 2 2 2
## P10 15 3 1 4 0 4 5 9 2 0 2 5 0 1
## P11 10 2 1 1 0 1 5 1 22 3 1 6 0 1
## P12 11 0 2 0 0 4 10 0 8 2 3 6 4 1
## P13 13 1 2 4 0 4 15 0 4 5 6 1 3 2
## P14 10 0 0 1 2 0 12 0 12 2 0 7 0 3
## Y01 14 47 1 1 2 0 0 0 0 0 0 0 0 0
## Y04 10 31 0 1 0 0 0 0 0 0 0 0 0 0
We have 849 cells.
dim(core)
## [1] 1000 849
core[1:3, 1:3]
## OEP01_N706_S501 OEP01_N701_S501 OEP01_N707_S507
## Cbr2 5799 3638 1448
## Cyp2f2 2158 2027 1078
## Gstm1 8763 7221 3581
Let’s create a SummarizedExperiment object to store the raw counts and information about the data, that is, batches, original labels, and quality control measures.
se <- SummarizedExperiment(assays = list(rawCounts = core),
colData = metadata)
To cluster and get lineages we want to reduce the dimension of the data. We use zinbwave to do so. First, we fit zinbwave with K = 0 to compute normalized values (i.e., deviance residuals) adjusted for batches. We could also adjust for gene length or GC content here. We then fit zinbwave to get the dimensionality-reduced matrix W, specifying the number of dimensions K = 50. Eventually, we will call zinbwave just once, with an argument to zinbFit like “compute_normalized_values” in c(TRUE, FALSE). For both K = 0 and K = 50, we correct for batch effects by including the batches in X.
fn <- '../data/zinb_batch.rda'
if (runZinb & !file.exists(fn)){
print(system.time(se <- zinbDimRed(se, K = 50, X = '~ Batch',
residuals = TRUE,
normalizedValues = TRUE)))
save(se, file = fn)
}else{
load(fn)
}
We use the deviance residuals as normalized values for visualization. FP: explain the rationale: with K = 0 the residuals capture the biology while adjusting for batch. Let’s check that the deviance residuals look OK.
FP: note to self: why do we have infinite values in the residuals now? It shows the same results as before, but we should not see infinite values here!
norm <- assays(se)$normalizedValues
if (sum(is.infinite(norm))>0){
maxNorm <- max(norm[!is.infinite(norm)])
assays(se)$normalizedValues[is.infinite(norm)] <- maxNorm
norm <- assays(se)$normalizedValues
}
norm[1:3,1:3]
## OEP01_N706_S501 OEP01_N701_S501 OEP01_N707_S507
## Cbr2 4.533210 4.365844 -4.136863
## Cyp2f2 4.355783 4.321493 4.117663
## Gstm1 4.728341 4.624404 4.400054
Boxplot of the normalized values for each cell. The correction for batches seems OK.
norm_order <- norm[, order(as.numeric(batch))]
col_order <- as.numeric(batch)[order(as.numeric(batch))]
boxplot(norm_order, main='Boxplot of normalized values\ncolor=batch',
col = col_order, staplewex = 0, outline = 0, border = col_order,
xaxt = 'n')
PCA on the normalized values, colored by batch on the left and by the previously found clusters on the right. We want to see no batch-driven clustering on the left and biological clustering on the right.
pca <- prcomp(t(norm))
par(mfrow = c(1,2))
plot(pca$x, col = col_batch[batch], pch = 20,
main="PCA of normalized values\ncolor=batch")
plot(pca$x, col = col_clus[as.character(clus.labels)], pch = 20,
main = "PCA of normalized values\ncolor=cluster")
par(mfrow = c(1,1))
Let’s check that performing MDS on W gives something coherent with the original clusters.
W <- colData(se)[, grepl('^W', colnames(colData(se)))]
W <- as.matrix(W)
d <- dist(W)
fit <- cmdscale(d, eig = TRUE, k = 2)
plot(fit$points, col = col_clus[as.character(clus.labels)], main = 'MDS', pch = 20,
xlab = 'Component 1', ylab = 'Component 2')
legend(x = 'bottomright', legend = unique(names(col_clus)), cex = .5,
fill = unique(col_clus), title = 'Sample')
We use clusterExperiment with W.
EP: I updated it to work on a SE object so that it has the metadata. If you have a SummarizedExperiment object with W already, you could use that as long as assay(seObj) gives W.
fn <- '../data/RSEC_W.rda'
if (runClus & !file.exists(fn)){
#symbol for samples missing from original clustering
metadata$clusterLabels[is.na(metadata$clusterLabels)] <- -2
seObj <- SummarizedExperiment(t(W), colData = metadata)
print(system.time(ceObj <- RSEC(seObj, k0s = 4:15, alphas = c(0.1),
betas = 0.8,
clusterFunction = "hierarchical01", minSizes=1,
ncores = NCORES, isCount=FALSE,
subsampleArgs = list(resamp.num=100,
clusterFunction="kmeans",
clusterArgs=list(nstart=10)),
seqArgs = list(k.min=3, top.can=5), verbose=TRUE,
combineProportion = 0.7,
mergeMethod = "none")))
save(ceObj, file = fn)
}else{
load(fn)
}
plotClusters(ceObj, colPalette = c(bigPalette, rainbow(199)))
plotCoClustering(ceObj)
## Warning in .makeColors(clusters, colors = bigPalette): too many clusters to
## have unique color assignments
table(primaryClusterNamed(ceObj))
##
## -1 c1 c10 c2 c3 c4 c5 c6 c7 c8 c9
## 329 125 36 9 89 94 13 96 5 48 5
sum(primaryCluster(ceObj) == -1)
## [1] 329
FP: Elizabeth, we are working with the W here; does locfdr make sense in this context? I set eval=FALSE in the next chunk to skip the merging step; let me know if you would rather keep using it. And if we still want the merging step, would we want to include it in the RSEC function arguments instead of calling it separately?
EP: I don’t think the merging step on the W makes a whole lot of sense – the method is irrelevant. The merging is based on calculating the % of genes found significant (the specific method is arbitrary). The best thing would be to replace the W with residuals in the assay of ceObj (or whatever data that you will do the DE on for the time stuff below), and then run the merging step on that data. I’m not particularly fond of locfdr. It was probably the method that gave the best merging to Russell and Diya. You’d really have to run mergeClusters setting plotInfo="all" and look at the results and decide both the cutoff level and the method.
EP: Also, if you don’t save the output of mergeClusters it doesn’t update ceObj. I was calling it just for the resulting plots, since it was already merged in RSEC above. I’ve changed the code to update ceObj below.
FP: Ha ok, good to know. I’ll keep the eval=FALSE for the moment.
#re-does the merging simply to make the plot
#something like:
#assay(ceObj)
# if that replacement data should be considered on the transformed scale in plots, etc, the transformation function should be fixed as well:
#transformation(ceObj)
ceObj<-mergeClusters(ceObj, mergeMethod = "locfdr",
plotInfo = "mergeMethod", cutoff = 0.01)
So, let’s look at a heatmap on normalized values.
FP: Elizabeth, I did not find how to define the column annotation track in the plot below so that it has the same colors as in ceObj@clusterLegend[[1]]. I tried to use the arguments annColors and annCol from aheatmap, since the plotHeatmap documentation says that for the matrix signature, arguments can be passed to aheatmap. But I got the error “The following arguments to aheatmap cannot be set by the user in plotHeatmap: Rowv, Colv, color, annCol, annColors”.
EP: Fanny, you would need to use the argument ‘clusterLegend’. That argument takes either the format of aheatmap (list with each element a named vector of colors) or the format of the clusterExperiment object (i.e. list with each element a matrix with columns for name and color). So I think the following code will run, though it might need the list to have names…
But an easier fix to the code would be to set the visualizeData option. I haven’t tested this because I don’t have the objects needed to run it, so let me know if there is an error.
FP: it seems great to me, what do you think?
EP: We should be careful, because the default in plotHeatmap is to plot the 500 most variable genes (maybe a slightly paternalistic default). I’ve changed it to all in the code here. I’ve also added the plotting of the batch, experiment, and Russell’s original clusters. We may not want to keep all of them, but probably at least Russell’s clusters for comparison.
# sampleData <- data.frame(ours = primaryCluster(ceObj))
# plotHeatmap(assays(se)$normalizedValues,
# main = 'Normalized values, 1000 most variable genes',
# clusterSamplesData = ceObj@dendro_samples,
# sampleData = as.matrix(sampleData),clusterLegend=ceObj@clusterLegend[1])
# easier fix:
origClusterColors <- bigPalette[1:nlevels(colData(ceObj)$clusterLabels)]
experimentColors <- bigPalette[1:nlevels(colData(ceObj)$Experiment)]
batchColors <- bigPalette[1:nlevels(colData(ceObj)$Batch)]
metaColors <- list("Experiment" = experimentColors, "Batch" = batchColors,
"clusterLabels" = origClusterColors)
plotHeatmap(ceObj, visualizeData = assays(se)$normalizedValues,
whichClusters = "primary",clusterFeaturesData="all",
clusterSamplesData = "dendrogramValue",
sampleData=c("clusterLabels","Batch","Experiment"),
clusterLegend=metaColors, annLegend=FALSE,
main = 'Normalized values, 1000 most variable genes',
breaks = 0.99)
plot(fit$points, col = col_clus[as.character(clus.labels)],
main = 'MDS W, color = original clusters', pch = 20,
xlab = 'Component1', ylab = 'Component2')
legend(x = 'bottomright', legend = unique(names(col_clus)), cex = .5,
fill = unique(col_clus), title = 'Sample')
palDF <- ceObj@clusterLegend[[1]]
pal <- palDF[, 'color']
names(pal) <- palDF[, 'name']
pal["-1"] = "transparent"
plot(fit$points, col = pal[primaryClusterNamed(ceObj)],
main = 'MDS W, color = our new clusters', pch = 20,
xlab = 'Component1', ylab = 'Component2')
legend(x = 'bottomright', legend = names(pal), cex = .5,
fill = pal, title = 'Sample')
The goal of this section is to see whether we need to refit zinbwave before running slingshot. We first run slingshot on the W used by clusterExperiment. In the second part of this section, we fit zinbwave on the matrix of counts from which the unassigned cells have been removed. For each part (without or with refitting zinbwave), we run slingshot in supervised and unsupervised mode and try k = 3, k = 4, and k = 5 dimensions in W.
From what I understand, the starting original clusters are 1 and 5 (HBC) and the end original clusters are 15 (Microvillus), 9 and 12 (neuron), and 4, 7 (Sus). Additionally, we want the GBC cluster to be at a junction before the differentiation between Microvillus and Neuron. The correspondence with the original clusters is as follows:
table(data.frame(original = clus.labels, ours = primaryClusterNamed(ceObj)))
## ours
## original -1 c1 c10 c2 c3 c4 c5 c6 c7 c8 c9
## -2 126 36 6 4 25 7 12 5 5 5 2
## 1 47 43 0 1 0 0 0 0 0 0 0
## 2 2 1 0 0 0 22 0 0 0 0 0
## 3 2 3 0 0 1 49 1 0 0 0 0
## 4 16 2 0 0 22 0 0 0 0 0 0
## 5 53 38 0 4 1 0 0 0 0 0 0
## 7 21 0 0 0 39 0 0 0 0 0 0
## 8 27 1 0 0 0 0 0 0 0 0 0
## 9 15 0 0 0 1 0 0 57 0 3 3
## 10 2 0 0 0 0 0 0 0 0 24 0
## 11 6 1 0 0 0 15 0 0 0 0 0
## 12 1 0 0 0 0 0 0 34 0 0 0
## 14 9 0 0 0 0 1 0 0 0 16 0
## 15 2 0 30 0 0 0 0 0 0 0 0
| Cluster name | Description | Correspondence |
|---|---|---|
| c1 | HBC | original 1, 5 |
| c2 | new and small | new and small |
| c3 | new and small | new and small |
| c4 | GBC / immature neurons / MV 1 | original 2, 3, 11, 14 |
| c5 | Sus | original 4, 7 |
| c6 | Neuron | original 9, 12 |
| c7 | Immature Neuron | original 10, 14 |
| c8 | Immature Neuron | original 14 |
| c9 | Microvillus | original 15 |
Kvec <- c(3, 4, 5)
The input to slingshot is the W used for clusterExperiment, with the number of dimensions reduced to k, where k is in (3, 4, 5) here.
K = 3 does not seem very good to me: Sus is not an end cluster, and GBC is an end cluster.
K = 4 is better: slingshot finds the end clusters, but there is a spurious end cluster.
K = 5 does not seem great to me: GBC is an end cluster, and Sus and Microvillus are in the same lineage.
our_cl <- primaryClusterNamed(ceObj)
cl <- our_cl[our_cl != "-1"]
pal <- pal[names(pal) != '-1']
for (k in Kvec){
X <- W[our_cl != "-1", 1:k]
lineages <- get_lineages(X, clus.labels = cl, start.clus = "c1")
curves <- get_curves(X, clus.labels = cl, lineages = lineages)
plot_curves(X, cl, curves, col.clus = pal)
plot_tree(X, cl, lineages, col.clus = pal)
print(paste0("K=", k))
print(lineages$lineage1)
print(lineages$lineage2)
print(lineages$lineage3)
print(lineages$lineage4)
print(lineages$lineage5)
}
## [1] "K=3"
## [1] "c1" "c2" "c7" "c3" "c10" "c4" "c8" "c6"
## [1] "c1" "c2" "c7" "c3" "c10" "c4" "c8" "c9"
## [1] "c1" "c5"
## NULL
## NULL
## [1] "K=4"
## [1] "c1" "c5" "c4" "c8" "c6"
## [1] "c1" "c5" "c4" "c8" "c9"
## [1] "c1" "c2" "c7" "c3"
## [1] "c1" "c5" "c4" "c10"
## NULL
## [1] "K=5"
## [1] "c1" "c5" "c4" "c8" "c9" "c6"
## [1] "c1" "c2" "c7" "c3" "c10"
## NULL
## NULL
## NULL
K = 3 finds GBC as an end cluster (which I did not specify in end.clus!).
K = 4: yeah! It seems to be what we want, even if we still have a spurious end cluster and GBC is not really at the junction.
K = 5: yeah! Even if GBC is not really at the junction.
for (k in Kvec){
X <- W[our_cl != "-1", 1:k]
lineages <- get_lineages(X, clus.labels = cl, start.clus = "c1",
end.clus = c("c3", "c6", "c10"))
curves <- get_curves(X, clus.labels = cl, lineages = lineages)
plot_curves(X, cl, curves, col.clus = pal)
plot_tree(X, cl, lineages, col.clus = pal)
print(paste0("K=", k))
print(lineages$lineage1)
print(lineages$lineage2)
print(lineages$lineage3)
print(lineages$lineage4)
print(lineages$lineage5)
}
## [1] "K=3"
## [1] "c1" "c5" "c4" "c8" "c6"
## [1] "c1" "c5" "c4" "c8" "c9"
## [1] "c1" "c2" "c7" "c3"
## [1] "c1" "c5" "c4" "c10"
## NULL
## [1] "K=4"
## [1] "c1" "c5" "c4" "c8" "c6"
## [1] "c1" "c5" "c4" "c8" "c9"
## [1] "c1" "c2" "c7" "c3"
## [1] "c1" "c5" "c4" "c10"
## NULL
## [1] "K=5"
## [1] "c1" "c5" "c4" "c8" "c9" "c6"
## [1] "c1" "c2" "c7" "c3"
## [1] "c1" "c5" "c4" "c10"
## NULL
## NULL
K = 3 is better than when we did not refit zinbwave, but still not perfect: Sus is in all the clusters, and GBC is not really at the junction.
K = 4 is good, even if GBC is not really at the junction.
K = 5 is not great: GBC is an end cluster.
fn <- '../data/refit_zinbwave_slingshot.rda'
if (runZinb & !file.exists(fn)){
zinbList <- lapply(Kvec, function(k){
zinbFit(se[, our_cl != "-1"], X = '~ Batch', K = k)
})
save(zinbList, file = fn)
}else{
load(fn)
}
for(k in Kvec) {
X <- getW(zinbList[[k - 2]])[, 1:k]
lineages <- get_lineages(X, clus.labels = cl, start.clus = "c1")
curves <- get_curves(X, clus.labels = cl, lineages = lineages)
plot_curves(X, cl, curves, col.clus = pal)
plot_tree(X, cl, lineages, col.clus = pal)
print(paste0("K=", k))
print(lineages$lineage1)
print(lineages$lineage2)
print(lineages$lineage3)
print(lineages$lineage4)
print(lineages$lineage5)
}
## [1] "K=3"
## [1] "c1" "c5" "c4" "c8" "c9" "c6"
## [1] "c1" "c2" "c3" "c7"
## [1] "c1" "c5" "c4" "c10"
## NULL
## NULL
## [1] "K=4"
## [1] "c1" "c5" "c4" "c8" "c9" "c6"
## [1] "c1" "c5" "c4" "c10"
## [1] "c1" "c2"
## [1] "c1" "c3"
## [1] "c1" "c7"
## [1] "K=5"
## [1] "c1" "c5" "c4" "c8" "c9" "c6"
## [1] "c1" "c2" "c3"
## [1] "c1" "c7" "c10"
## NULL
## NULL
K = 3: yeah! Close to perfect.
K = 4 is good, even if GBC is not really at the junction.
K = 5: no, GBC is an end cluster.
for(k in Kvec){
X <- getW(zinbList[[k - 2]])[, 1:k]
lineages <- get_lineages(X, clus.labels = cl, start.clus = "c1",
end.clus = c("c3", "c6", "c10"))
curves <- get_curves(X, clus.labels = cl, lineages = lineages)
plot_curves(X, cl, curves, col.clus = pal)
plot_tree(X, cl, lineages, col.clus = pal)
print(paste0("K=", k))
print(lineages$lineage1)
print(lineages$lineage2)
print(lineages$lineage3)
print(lineages$lineage4)
print(lineages$lineage5)
}
## [1] "K=3"
## [1] "c1" "c5" "c4" "c8" "c9" "c6"
## [1] "c1" "c5" "c4" "c10"
## [1] "c1" "c2" "c3"
## [1] "c1" "c7"
## NULL
## [1] "K=4"
## [1] "c1" "c5" "c4" "c8" "c9" "c6"
## [1] "c1" "c5" "c4" "c10"
## [1] "c1" "c2"
## [1] "c1" "c3"
## [1] "c1" "c7"
## [1] "K=5"
## [1] "c1" "c5" "c4" "c8" "c9" "c6"
## [1] "c1" "c2" "c3"
## [1] "c1" "c7" "c10"
## NULL
## NULL
CONCLUSION: K = 5 is never great, as GBC is generally an end cluster. K = 4 is OK for all the methods and a bit better when zinbwave is refitted. K = 3 is good when refitting zinbwave and running in supervised mode.
It seems to me that using slingshot on W with k = 4, without refitting zinbwave, gives good results, where supervised mode is slightly better than unsupervised. It is just a single example and we should obviously not draw a general conclusion, but I think that for the purpose of the workflow it is fine to use slingshot without refitting zinbwave. We should add a note telling the user that it is better to refit zinbwave to have more power.
Here are the kinds of plots we want to present.
de <- read.csv('../data/oe_markers.txt', stringsAsFactors = FALSE, header = FALSE)
de <- de$V1
plotHeatmap(ceObj,
visualizeData = assays(se)$normalizedValues[rownames(se) %in% de, ],
clusterSamplesData = "dendrogramValue",
whichClusters = "primary",
main = 'Normalized values, 1000 most variable genes',
breaks = 0.99)
FP: Kelly, was it you who did the DE analysis for Russell’s paper? If yes, what tool did you use? On what data? The full-quantile-normalized counts? Do you have code available?
sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] rARPACK_0.11-0 digest_0.6.12
## [3] RColorBrewer_1.1-2 Rtsne_0.13
## [5] magrittr_1.5 gplots_3.0.1
## [7] ggplot2_2.2.1 doParallel_1.0.10
## [9] iterators_1.0.8 foreach_1.4.3
## [11] slingshot_0.0.3-5 princurve_1.1-12
## [13] zinbwave_0.99.4.2 BiocParallel_1.10.1
## [15] clusterExperiment_1.3.0-9009 scone_1.0.0
## [17] scRNAseq_1.2.0 SummarizedExperiment_1.6.3
## [19] DelayedArray_0.2.4 matrixStats_0.52.2
## [21] Biobase_2.36.2 GenomicRanges_1.28.3
## [23] GenomeInfoDb_1.12.1 IRanges_2.10.2
## [25] S4Vectors_0.14.2 BiocGenerics_0.22.0
##
## loaded via a namespace (and not attached):
## [1] shinydashboard_0.6.0 R.utils_2.5.0
## [3] RSQLite_1.1-2 AnnotationDbi_1.38.0
## [5] htmlwidgets_0.8 grid_3.4.0
## [7] trimcluster_0.1-2 RNeXML_2.0.7
## [9] DESeq_1.28.0 munsell_0.4.3
## [11] codetools_0.2-15 statmod_1.4.29
## [13] scran_1.4.4 DT_0.2
## [15] miniUI_0.1.1 colorspace_1.3-2
## [17] energy_1.7-0 knitr_1.16
## [19] uuid_0.1-2 pspline_1.0-17
## [21] robustbase_0.92-7 bayesm_3.0-2
## [23] NMF_0.20.6 tximport_1.4.0
## [25] GenomeInfoDbData_0.99.0 hwriter_1.3.2
## [27] rhdf5_2.20.0 rprojroot_1.2
## [29] EDASeq_2.10.0 diptest_0.75-7
## [31] R6_2.2.1 ggbeeswarm_0.5.3
## [33] taxize_0.8.4 locfit_1.5-9.1
## [35] flexmix_2.3-14 bitops_1.0-6
## [37] reshape_0.8.6 assertthat_0.2.0
## [39] scales_0.4.1 nnet_7.3-12
## [41] beeswarm_0.2.3 gtable_0.2.0
## [43] phylobase_0.8.4 RUVSeq_1.10.0
## [45] bold_0.4.0 rlang_0.1.1
## [47] genefilter_1.58.1 splines_3.4.0
## [49] rtracklayer_1.36.3 lazyeval_0.2.0
## [51] hexbin_1.27.1 rgl_0.98.1
## [53] yaml_2.1.14 reshape2_1.4.2
## [55] abind_1.4-5 GenomicFeatures_1.28.1
## [57] backports_1.1.0 httpuv_1.3.3
## [59] tensorA_0.36 tools_3.4.0
## [61] gridBase_0.4-7 stabledist_0.7-1
## [63] dynamicTreeCut_1.63-1 Rcpp_0.12.11
## [65] plyr_1.8.4 visNetwork_1.0.3
## [67] progress_1.1.2 zlibbioc_1.22.0
## [69] purrr_0.2.2.2 RCurl_1.95-4.8
## [71] prettyunits_1.0.2 viridis_0.4.0
## [73] zoo_1.8-0 cluster_2.0.6
## [75] data.table_1.10.4 RSpectra_0.12-0
## [77] mvtnorm_1.0-6 whisker_0.3-2
## [79] gsl_1.9-10.3 aroma.light_3.6.0
## [81] mime_0.5 evaluate_0.10
## [83] xtable_1.8-2 XML_3.98-1.7
## [85] mclust_5.3 gridExtra_2.2.1
## [87] compiler_3.4.0 biomaRt_2.32.0
## [89] scater_1.4.0 tibble_1.3.3
## [91] KernSmooth_2.23-15 R.oo_1.21.0
## [93] htmltools_0.3.6 pcaPP_1.9-61
## [95] segmented_0.5-2.0 tidyr_0.6.3
## [97] geneplotter_1.54.0 howmany_0.3-1
## [99] DBI_0.6-1 MASS_7.3-47
## [101] fpc_2.1-10 MAST_1.2.1
## [103] boot_1.3-19 compositions_1.40-1
## [105] ShortRead_1.34.0 Matrix_1.2-10
## [107] ade4_1.7-6 R.methodsS3_1.7.1
## [109] gdata_2.17.0 igraph_1.0.1
## [111] rncl_0.8.2 GenomicAlignments_1.12.1
## [113] registry_0.3 numDeriv_2016.8-1
## [115] locfdr_1.1-8 plotly_4.7.0
## [117] xml2_1.1.1 annotate_1.54.0
## [119] vipor_0.4.5 rngtools_1.2.4
## [121] pkgmaker_0.22 XVector_0.16.0
## [123] stringr_1.2.0 copula_0.999-16
## [125] ADGofTest_0.3 softImpute_1.4
## [127] Biostrings_2.44.0 rmarkdown_1.5
## [129] dendextend_1.5.2 edgeR_3.18.1
## [131] kernlab_0.9-25 shiny_1.0.3
## [133] Rsamtools_1.28.0 gtools_3.5.0
## [135] modeltools_0.2-21 rjson_0.2.15
## [137] nlme_3.1-131 jsonlite_1.4
## [139] viridisLite_0.2.0 limma_3.32.2
## [141] lattice_0.20-35 httr_1.2.1
## [143] DEoptimR_1.0-8 survival_2.41-3
## [145] FNN_1.1 prabclus_2.2-6
## [147] glmnet_2.0-10 class_7.3-14
## [149] stringi_1.1.5 mixtools_1.1.0
## [151] latticeExtra_0.6-28 caTools_1.17.1
## [153] memoise_1.1.0 dplyr_0.5.0
## [155] ape_4.1